Mesorhizobium australicum strain WSM2073T was isolated from root nodules on the pasture legume
Biserrula pelecinus growing in Australia in 2000. This aerobic, motile, gram negative,
non-spore-forming rod is poorly effective in N2 fixation on B. pelecinus and has gained the ability to
nodulate B. pelecinus following in situ lateral transfer of a symbiosis island from the original
inoculant strain for this legume, Mesorhizobium ciceri bv. biserrulae WSM1271. The genome of
M. australicum strain WSM2073T is 6,200,534 bp in size and encodes 6,013 protein-coding
genes and 67 RNA-only encoding genes. This genome does not contain any plasmids
but has a 455.7 kb genomic island from Mesorhizobium ciceri bv. biserrulae WSM1271 that
has been integrated into a phenylalanine-tRNA gene.



Introduction 

Biological nitrogen fixation (BNF) contributes
substantially to the productivity of sustainable
agriculture around the world and approximately
80% of biologically fixed nitrogen (N) is estimated
to be contributed by the symbiotic association
between root nodule bacteria (RNB) and leguminous
plants [1]. This process of symbiotic nitrogen
fixation (SNF) enables 175 million tons of atmospheric
nitrogen (N2) to be fixed each year into a plant
available form. SNF therefore reduces the need to
apply fertilizer to provide bioavailable nitrogen,
decreases greenhouse gas emissions derived from
fertilizer manufacture, alleviates chemical leaching
into the environment from the over-application
of fertilizer, and substantially enhances soil
nitrogen for crop and animal production [2-4].
Because of the substantial benefits of SNF, considerable
effort has been devoted to sourcing legumes from
different geographical locations to improve legume
productivity in different agricultural settings
[3].



The Mediterranean legume Biserrula pelecinus L.
is one of only three deep rooted annual legume
species widely used in commerce with the potential
to reduce the development of dryland salinity
in Australia and was therefore introduced into
Australia in 1994. Native RNB in Australian soil
were not capable of nodulating B. pelecinus and
therefore this host was inoculated with the inoculant
strain Mesorhizobium ciceri bv. biserrulae
WSM1271 [5] to obtain an effective symbiosis. Six
years after the introduction of this legume into
Western Australia, isolates were recovered from
root nodules on B. pelecinus growing in Northam,
Western Australia that were compromised in their
nitrogen fixation capacity. The gradual replacement
of the inoculant by established strains of
RNB that are competitive for nodulation but
suboptimal in N2 fixation threatens the successful
establishment of this new legume in agriculture
[6].



The Genomic Standards Consortium 



Reeve et al. 



One of these poorly effective but competitive
strains, isolated from a nodule of B.
pelecinus grown in the wheat belt of Western
Australia, fixes <40% of the N2 fixed by the
original inoculant M. ciceri bv. biserrulae
WSM1271. This strain has been designated as
WSM2073T (= LMG 24608 = HAMBI 3006) and is
now the recognized type strain for the species
Mesorhizobium australicum [7]. The species name
au.stra.li'cum. N.L. neut. adj. australicum refers
to Australia, where this isolate originated [7].
The strain represents a dominant chromosomal type
surviving as a soil saprophyte in the Western
Australian wheat belt [6,8] that appears to
have the capacity to acquire symbiotic genes
through horizontal transfer [9].

In this report we present a summary classification
and a set of general features for M.
australicum strain WSM2073T together with the
description of the complete genome sequence
and its annotation. Here we reveal that a 455.7
kb genomic island from the inoculant
Mesorhizobium ciceri bv. biserrulae WSM1271
has been horizontally transferred into M.
australicum strain WSM2073T and integrated
into the phenylalanine-tRNA gene.

Classification and features 

M. australicum strain WSM2073T is a motile,
gram negative, non-spore-forming rod (Figure 1
Left and Center) in the order Rhizobiales of the
class Alphaproteobacteria. The cells are moderately
fast growing, forming 2-4 mm diameter colonies
within 3-4 days, and have a mean generation time
of 4-6 h when grown in half Lupin Agar (½LA)
broth [10] at 28 °C. Colonies on ½LA are
white-opaque, slightly domed, and moderately mucoid with
smooth margins (Figure 1 Right). Strains of this
organism are able to tolerate a pH range between
5.5 and 9.0. More information on carbon
source utilization and fatty acid profiles has been
described previously [7]. Minimum Information
about a Genome Sequence (MIGS) is given in
Table 1.

Figure 2 shows the phylogenetic neighborhood of 
M. australicum strain WSM2073T in a 16S rRNA 
sequence based tree. This strain clustered in a 
tight group which included M. shangrilense, M. 
loti and M. ciceri and had >99% sequence similar- 
ity with all four type strains. However, based on a 
polyphasic taxonomic study we have identified 
this strain to belong to a new species [7]. 
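
The ">99% sequence similarity" figure above is a pairwise comparison of aligned 16S rRNA sequences. The sketch below shows one common way such a percentage can be computed. It is an illustration only: the function name and the toy sequences are hypothetical, and the source does not state which tool or exact definition of similarity the authors used.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions between two aligned sequences.

    Columns where either sequence carries a gap ('-') are skipped, so the
    comparison is restricted to gap-free sites (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy sequences (hypothetical, not real 16S rRNA data): 1 mismatch in 20 sites.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACGAACGTACGT"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")  # prints "95.0% identity"
```

Over the 1,290 bp internal region used for the tree, a similarity above 99% would correspond to at most 12 differing positions under this definition.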

Symbiotaxonomy 

M. australicum strain WSM2073T has an extremely
narrow legume host range for symbiosis, only
forming partially effective nitrogen-fixing root
nodules on Biserrula pelecinus L. [6]. This strain
also nodulates the closely related species
Astragalus membranaceus but does not nodulate
21 other legume species nodulated by
Mesorhizobium spp. [6]. Strain WSM2073T has
similar highly specific symbiotic nodulation
capabilities to M. ciceri bv. biserrulae WSM1271, but is
a poor N-fixer on B. pelecinus L.
Table 1. Classification and general features of M. australicum strain WSM2073T according to the MIGS recommendations [11].

MIGS ID    Property                 Term                                                   Evidence code
           Current classification   Domain Bacteria                                        TAS [12]
                                    Phylum Proteobacteria                                  TAS [13]
                                    Class Alphaproteobacteria                              TAS [14,15]
                                    Order Rhizobiales                                      TAS [15,16]
                                    Family Phyllobacteriaceae                              TAS [15,17]
                                    Genus Mesorhizobium                                    TAS [18]
                                    Species Mesorhizobium australicum                      TAS [7]
           Gram stain               Negative                                               TAS [7]
           Cell shape               Rod                                                    TAS [7]
           Motility                 Motile                                                 TAS [7]
           Sporulation              Non-sporulating                                        TAS [19]
           Temperature range        Mesophile                                              TAS [19]
           Optimum temperature      28°C                                                   TAS [7]
           Salinity                 Unknown                                                NAS
MIGS-22    Oxygen requirement       Aerobic                                                TAS [19]
           Carbon source            Arabinose, gentibiose, glucose, mannitol & melibiose   TAS [7]
           Energy source            Chemoorganotroph                                       TAS [19]
MIGS-6     Habitat                  Soil, root nodule, host                                TAS [7]
MIGS-15    Biotic relationship      Free living, Symbiotic                                 TAS [7]
MIGS-14    Pathogenicity            None                                                   NAS [19]
           Biosafety level          1                                                      TAS [20]
           Isolation                Root nodule of Biserrula pelecinus L.                  TAS [7]
MIGS-4     Geographic location      Northam, Western Australia                             TAS [6]
MIGS-5     Nodule collection date   August 2000                                            TAS [6]
MIGS-4.1   Longitude                116.947875                                             TAS [6]
MIGS-4.2   Latitude                 -31.530408                                             TAS [6]
MIGS-4.3   Depth                    10 cm                                                  IDA
MIGS-4.4   Altitude                 160 m                                                  IDA

Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable
Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted
property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [21]. If the
evidence code is IDA, then the property was directly observed by one of the authors or an expert mentioned in the
acknowledgements.
Figure 2. Phylogenetic tree showing the relationships of M. australicum strain WSM2073T with some of the root
nodule bacteria in the order Rhizobiales based on aligned sequences of the 16S rRNA gene (1,290 bp internal
region). All sites were informative and there were no gap-containing sites. Phylogenetic analyses were performed
using MEGA [22]. The tree was built using the Maximum-Likelihood method with the General Time Reversible
model. Bootstrap analysis [23] was performed to assess the support of the clusters. Type strains are indicated with a
superscript T. Brackets after the strain name contain a DNA database accession number and/or a GOLD ID (beginning
with the prefix G) for a sequencing project registered in GOLD [24]. Published genomes are indicated with an
asterisk.
Table 2. Genome sequencing project information for Mesorhizobium australicum strain WSM2073T

MIGS ID     Property                  Term
MIGS-31     Finishing quality         Finished
MIGS-28     Libraries used            Illumina GAii shotgun library, 454 Titanium standard library and paired end 454 libraries
MIGS-29     Sequencing platforms      Illumina and 454 technologies
MIGS-31.2   Sequencing coverage       454 standard and paired end (28×) and Illumina (2159×); total 2187×
MIGS-30     Assemblers                Newbler v2.3 and Velvet v0.7.63, PHRAP SPS-4.24 and CONSED
MIGS-32     Gene calling method       Prodigal v.2.50, GenePrimp
            Genbank ID                CP003358
            Genbank Date of Release   December 28, 2012
            GOLD ID                   Gc02468
            NCBI project ID           47287
            Database: IMG             2509276022
            Project relevance         Symbiotic nitrogen fixation, agriculture


Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the
basis of its environmental and agricultural relevance
to issues in global carbon cycling, alternative
energy production, and biogeochemical importance,
and is part of the Community Sequencing
Program at the US Department of Energy Joint
Genome Institute (JGI) for projects of relevance to
agency missions. The genome project is deposited
in the Genomes OnLine Database [24] and the
complete genome sequence in GenBank. Sequencing,
finishing and annotation were performed by
the DOE Joint Genome Institute (JGI). A summary
of the project information is shown in Table 2.

Growth conditions and DNA isolation 

M. australicum strain WSM2073T was grown to
mid logarithmic phase in TY medium (a rich
medium) [25] on a gyratory shaker at 28°C. DNA was
isolated from 60 mL of cells using a CTAB (Cetyl
trimethylammonium bromide) bacterial genomic
DNA isolation method.

Genome sequencing and assembly 

The draft genome of M. australicum strain
WSM2073T was generated at the DOE Joint Genome
Institute (JGI) using a combination of Illumina [26]
and 454 technologies [27]. For this genome, we
constructed and sequenced an Illumina GAii shotgun
library which generated 10,509,788 reads totaling
378.4 Mb, a 454 Titanium standard library
which generated 235,807 reads and paired end 454
libraries with average insert sizes of 263 Kb
/10.9 Kb which generated 221,877/139,171 reads
totaling 257.0 Mb of 454 data. All general aspects of
library construction and sequencing performed in
this project can be found at the DOE Joint Genome
Institute website. The initial draft assembly
contained 14 contigs in 1 scaffold. The 454 Titanium
standard data and the 454 paired end data were
assembled together with Newbler, version 2.3. The
Newbler consensus sequences were computationally
shredded into 2 Kb overlapping fake reads
(shreds). Illumina sequencing data was assembled
with VELVET, version 0.7.63 [28], and the consensus
sequences were computationally shredded into
1.5 Kb overlapping fake reads (shreds). We integrated
the 454 Newbler consensus shreds, the
Illumina VELVET consensus shreds and the read
pairs in the 454 paired end library using parallel
phrap, version SPS-4.24 (High Performance Software,
LLC). The software Consed [29-31] was used
in the following finishing process. Illumina data
was used to correct potential base errors and increase
consensus quality using the software Polisher
developed at JGI (Alla Lapidus, unpublished).
Possible mis-assemblies were corrected using
gapResolution (Cliff Han, unpublished), Dupfinisher
[32], or sequencing cloned bridging PCR fragments
with subcloning. Gaps between contigs were closed
by editing in Consed, by PCR and by Bubble PCR
(J-F Cheng, unpublished) primer walks. A total of 59
additional reactions were necessary to close gaps
and to raise the quality of the finished sequence.
The total size of the genome is 6,200,534 bp and
the final assembly is based on 257 Mb of 454 draft
data which provides an average 28× coverage of
the genome and 13,385 Mb of Illumina draft data
which provides an average 2159× coverage of the
genome.
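
The "shredding" step described above, in which a consensus sequence is cut into overlapping fake reads so that assemblies from different platforms can be merged, can be sketched as follows. This is a minimal illustration and not the JGI implementation: the function name is hypothetical, and the overlap length is an assumption because the text gives only the shred lengths (2 Kb and 1.5 Kb).

```python
def shred(consensus: str, shred_len: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into overlapping fixed-length pseudo-reads.

    Consecutive shreds share `overlap` bases; the final shred may be shorter
    than `shred_len` if the sequence length is not an exact fit.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be >= 0 and smaller than shred_len")
    if not consensus:
        return []
    step = shred_len - overlap
    shreds = []
    start = 0
    while True:
        shreds.append(consensus[start:start + shred_len])
        if start + shred_len >= len(consensus):
            break  # the last shred reached the end of the consensus
        start += step
    return shreds


# Synthetic 5 kb "contig"; the 1 kb overlap is a hypothetical value.
contig = "ACGT" * 1250
pieces = shred(contig, shred_len=2000, overlap=1000)
print(len(pieces))  # prints 4
```

Because neighbouring shreds overlap, a downstream assembler such as phrap can in principle re-join them and merge them with reads from the other data sets.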
Genome annotation

Genes were identified using Prodigal [33] as part
of the Oak Ridge National Laboratory genome
annotation pipeline, followed by a round of manual
curation using the JGI GenePrimp pipeline [34].
The predicted CDSs were translated and used to
search the National Center for Biotechnology
Information (NCBI) non-redundant database,
UniProt, TIGRFam, Pfam, PRIAM, KEGG, COG, and
InterPro databases. These data sources were
combined to assert a product description for each
predicted protein. Non-coding genes and miscellaneous
features were predicted using tRNAscan-SE
[35], RNAmmer [36], Rfam [37], TMHMM [38],
and SignalP [39]. Additional gene prediction
analyses and functional annotation were performed
within the Integrated Microbial Genomes (IMG-ER)
platform [40].

Genome properties 

The genome is 6,200,534 bp long with a 62.84%
GC content (Table 3, Figure 3) and comprises a
single chromosome. Of the 6,080 genes predicted,
6,013 were protein coding genes and
67 were RNA only encoding genes. Two hundred and
twenty one pseudogenes were also identified. The
majority of protein coding genes (4,875; 80.18%)
were assigned a putative function whilst the
remaining protein coding genes were annotated as
encoding hypothetical proteins. The distribution
of genes into COGs functional categories is
presented in Table 4.
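
The percentages quoted here and in Table 3 can be re-derived from the raw counts. The short check below uses only numbers reported in this section; the variable names and the `pct` helper are illustrative. Note that the 80.18% figure is reproduced when the denominator is the total gene count (6,080), which matches the "% of Total" column of Table 3.

```python
# Counts as reported in the text and in Table 3.
genome_size_bp = 6_200_534
coding_bp = 5_371_783
gc_bp = 3_896_642
protein_coding = 6_013
rna_genes = 67
with_function = 4_875

total_genes = protein_coding + rna_genes  # 6,080


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


print("GC content (%):              ", pct(gc_bp, genome_size_bp))        # 62.84
print("Coding density (%):          ", pct(coding_bp, genome_size_bp))    # 86.63
print("Protein-coding genes (%):    ", pct(protein_coding, total_genes))  # 98.9
print("RNA genes (%):               ", pct(rna_genes, total_genes))       # 1.1
print("With function prediction (%):", pct(with_function, total_genes))   # 80.18
```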



Table 3. Genome Statistics for Mesorhizobium australicum strain WSM2073T

Attribute                          Value       % of Total
Genome size (bp)                   6,200,534   100
DNA coding region (bp)             5,371,783   86.63
DNA G+C content (bp)               3,896,642   62.84
Number of replicons                1           100
Extrachromosomal elements          0
Total genes                        6,080       100
RNA genes                          67          1.1
Protein-coding genes               6,013       98.9
Genes with function prediction     4,875       80.18
Genes assigned to COGs             4,877       80.21
Genes assigned Pfam domains        5,082       83.40
Genes with signal peptides         536         8.82
Genes with transmembrane helices   1,434       23.59

Figure 3. Graphical circular map of the chromosome of Mesorhizobium australicum WSM2073T. From outside to
the center: Genes on forward strand (color by COG categories as denoted by the IMG platform), Genes on reverse
strand (color by COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC content, GC skew.
Table 4. Number of protein coding genes of Mesorhizobium australicum WSM2073T
associated with the general COG functional categories.

Code    Value   %age    Description
J       192     3.56    Translation, ribosomal structure and biogenesis
A       1       0.02    RNA processing and modification
K       450     8.34    Transcription
L       179     3.32    Replication, recombination and repair
B       5       0.09    Chromatin structure and dynamics
D       35      0.65    Cell cycle control, mitosis and meiosis
Y       0       0.00    Nuclear structure
V       60      1.11    Defense mechanisms
T       214     3.96    Signal transduction mechanisms
M       305     5.65    Cell wall/membrane biogenesis
N       42      0.78    Cell motility
Z       0       0.00    Cytoskeleton
W       1       0.02    Extracellular structures
U       115     2.13    Intracellular trafficking and secretion
O       180     3.33    Posttranslational modification, protein turnover, chaperones
C       302     5.59    Energy production and conversion
G       511     9.47    Carbohydrate transport and metabolism
E       634     11.75   Amino acid transport and metabolism
F       94      1.74    Nucleotide transport and metabolism
H       201     3.72    Coenzyme transport and metabolism
I       216     4.00    Lipid transport and metabolism
P       239     4.43    Inorganic ion transport and metabolism
Q       156     2.89    Secondary metabolite biosynthesis, transport and catabolism
R       699     12.95   General function prediction only
S       567     10.50   Function unknown
-       1,203   19.79   Not in COGs
Total   5,748